The Project

  1. This is a project with minimal scaffolding. Expect to use the discussion forums to gain insights! It’s not cheating to ask others for opinions or perspectives!
  2. Be inquisitive, try out new things.
  3. Use the previous modules for insights into how to complete the functions! You'll have to combine Pillow, OpenCV, and Pytesseract.
  4. There are hints provided in Coursera; feel free to explore them if needed. Each hint provides progressively more detail on how to solve the issue. Without the hints, this project is intended to be comprehensive and difficult.

The Assignment

Take a ZIP file of images and process them using a library built into Python that you need to learn how to use. A ZIP file compresses several different files into one single file, thus saving space. The files in the ZIP file we provide are newspaper images (like you saw in week 3). Your task is to write Python code that lets one search through the images for occurrences of keywords and faces. For example, if you search for "pizza", it will return a contact sheet of all of the faces located on the newspaper pages that mention "pizza". This will test your ability to learn a new library, your ability to use OpenCV to detect faces, your ability to use Tesseract to do optical character recognition, and your ability to use PIL to composite images together into contact sheets.
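
The notebook below extracts each archive to disk before working on it. As an alternative sketch (an assumption, not the method used below), Python's built-in zipfile module can also hand back each page's bytes straight from the archive: ZipFile.infolist() describes every member and ZipFile.read() decompresses one into memory. The helper name iter_zip_pngs and the ".png" filter are hypothetical choices for illustration.

```python
import zipfile


def iter_zip_pngs(zip_source):
    """Yield (name, raw_bytes) for every PNG member of a ZIP archive.

    zip_source may be a path or a file-like object. Nothing is written
    to disk; each member is decompressed into memory when it is reached.
    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            # skip directory entries and anything that is not a PNG
            if info.is_dir() or not info.filename.lower().endswith(".png"):
                continue
            yield info.filename, zf.read(info.filename)
```

Each pair could then be opened with PIL as Image.open(io.BytesIO(data)), which avoids the extraction step at the cost of decompressing the page again every time it is needed.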

Each page of the newspapers is saved as a single PNG image in a file called images.zip. These newspapers are in English and contain a variety of stories, advertisements, and images. Note: This file is fairly large (~200 MB) and may take some time to work with; I would encourage you to use small_img.zip for testing.

Here's an example of the expected output. Using the small_img.zip file, if I search for the string "Christopher", I should see the following image:

[image: Christopher search results]

If I were to use the images.zip file and search for "Mark", I should see the following image (note that sometimes the word is found on a page that has no faces!):

[image: Mark search results]

Note: That big file can take some time to process - for me it took nearly ten minutes! Use the small one for testing.

In [2]:
import zipfile

from PIL import Image
import pytesseract
import cv2 as cv
import numpy as np

# load the Haar cascade face-detection classifier
face_cascade = cv.CascadeClassifier('readonly/haarcascade_frontalface_default.xml')
if face_cascade.empty():
    raise IOError('Could not load readonly/haarcascade_frontalface_default.xml')

# extract the full data set (images.zip) into the ./training folder
with zipfile.ZipFile('./readonly/images.zip', 'r') as zip_ref:
    zip_ref.extractall('training')

# extract the small test set (small_img.zip) into the ./testing folder
with zipfile.ZipFile('./readonly/small_img.zip', 'r') as zip_ref:
    zip_ref.extractall('testing')
In [3]:
# get the file names stored inside each archive
with zipfile.ZipFile("./readonly/images.zip") as zf:
    training_list = zf.namelist()
with zipfile.ZipFile("./readonly/small_img.zip") as zf:
    testing_list = zf.namelist()
# test whether it works by displaying the first page
Image.open("training/" + training_list[0])
Out[3]:
In [4]:
# use pytesseract to recognize the words on a page and return them as a string
def text_extract(name, path):
    img = Image.open(path + name)
    # convert the page to greyscale before running OCR
    img = img.convert('L')
    # run OCR on the greyscale image
    text = pytesseract.image_to_string(img)
    return text
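
The search in this notebook uses a plain `keyword in text` test, which is a case-sensitive substring match, so searching for "Mark" also matches "Market". If whole-word matches are wanted, one possible sketch uses regular-expression word boundaries. The helper name mentions_keyword and its case_sensitive flag are hypothetical, and swapping it in may change which pages match compared with the reference images above.

```python
import re


def mentions_keyword(text, keyword, case_sensitive=True):
    """Return True if keyword appears in text as a whole word.

    Unlike `keyword in text`, 'Mark' will not match 'Market'.
    Hypothetical helper for illustration only.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    # re.escape treats characters such as '.' or '+' in the keyword literally;
    # \b requires a word boundary on both sides of the match
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, text, flags) is not None
```

Inside face_from_key, the condition `keyword in text_extract(name, path)` would become `mentions_keyword(text_extract(name, path), keyword)`.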
In [24]:
from PIL import ImageDraw

# detect faces on the first training page (scaleFactor = 1.35)
cv_img1 = cv.imread("training/" + "a-0.png")
face1 = face_cascade.detectMultiScale(cv_img1, 1.35)

def show_rects(faces):
    # read the same page with PIL and convert it to RGB so we can draw in colour
    pil_img = Image.open("training/" + "a-0.png").convert("RGB")
    # set our drawing context
    drawing = ImageDraw.Draw(pil_img)
    # plot a red rectangle around every detected face
    for x, y, w, h in faces:
        drawing.rectangle((x, y, x + w, y + h), outline="red")
    # finally, display the annotated page (display() is provided by Jupyter)
    display(pil_img)

show_rects(face1)
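
detectMultiScale returns one (x, y, w, h) row per face, while PIL's crop() and ImageDraw.rectangle() expect a (left, upper, right, lower) box; the cell above does that conversion inline. A small sketch of the same conversion as a helper, assuming you might want a few pixels of margin around each face: the name face_box and the pad parameter are hypothetical, and the box is clamped so padding never reaches past the page edges.

```python
def face_box(x, y, w, h, img_width, img_height, pad=0):
    """Convert an OpenCV (x, y, w, h) detection into a PIL box.

    Returns (left, upper, right, lower), grown by `pad` pixels on every
    side and clamped to the image bounds. Hypothetical helper for
    illustration only.
    """
    left = max(0, int(x) - pad)
    upper = max(0, int(y) - pad)
    right = min(img_width, int(x + w) + pad)
    lower = min(img_height, int(y + h) + pad)
    return left, upper, right, lower
```

In show_rects this could be used as `drawing.rectangle(face_box(x, y, w, h, *pil_img.size), outline="red")`, since PIL's size is (width, height).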
In [11]:
# Crop every face on a page and paste the crops into a one-row contact sheet
from PIL import ImageFont
font = ImageFont.truetype("readonly/fanwood-webfont.ttf", 20)

def im_crop_n_paste(name, len_d, path):
    cv_img = cv.imread(path + name)
    image = Image.open(path + name)
    faces = face_cascade.detectMultiScale(cv_img, 1.35)

    # crop each face: OpenCV gives (x, y, w, h), PIL wants (left, upper, right, lower)
    im = [image.crop((x, y, x + w, y + h)) for x, y, w, h in faces]
    # uncomment to display the individual crops
    # for crop in im: display(crop)

    if len(im) != 0:
        # full-size sheet: a 20 px header strip above a 292 px row of faces
        contact_sheet = Image.new(image.mode, (1752, 312))
        x, y = 0, 20
        for img in im:
            # scale each face to a height of 292 px, keeping its aspect ratio
            scale = (312 - 20) / img.height
            img = img.resize((int(scale * img.width), 292), Image.LANCZOS)
            # faces are pasted left to right; anything past 1752 px is clipped
            contact_sheet.paste(img, (x, y))
            x += img.width
        # halve the sheet to 876 x 156, then write the file name on a white header
        contact_sheet = contact_sheet.resize((contact_sheet.width // 2, contact_sheet.height // 2))
        draw = ImageDraw.Draw(contact_sheet)
        draw.rectangle((0, 0, contact_sheet.width, 20), fill="white")
        draw.text((0, 0), "Results found in file {}".format(name), font=font, fill="black")
    else:
        # no faces: a 40 px white banner with two lines of text
        contact_sheet = Image.new(image.mode, (len_d, 40), "white")
        draw = ImageDraw.Draw(contact_sheet)
        draw.text((0, 0), "Results found in file {}".format(name), font=font, fill="black")
        draw.text((0, 20), "But there were no faces in that file!", font=font, fill="black")

    return contact_sheet

# test
a = im_crop_n_paste(training_list[0], 876, "training/")
display(a)
print(a.size)
(876, 156)
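
im_crop_n_paste places every face in a single 1752 px row, so on a page with many detections the faces beyond the right edge are clipped off the canvas. One way around that is a grid that wraps to a new row. The sketch below only computes where each thumbnail goes and how large the sheet must be, so the PIL pasting stays as it is; the name grid_layout, the fixed cell size, and the column count are illustrative assumptions rather than values from the assignment.

```python
import math


def grid_layout(n_items, cell_w, cell_h, columns, header_h=0):
    """Compute paste positions for n_items thumbnails laid out in a grid.

    Items fill each row left to right and wrap after `columns` cells.
    Returns (positions, (sheet_width, sheet_height)), where positions is a
    list of (x, y) top-left corners placed below an optional header strip.
    Hypothetical helper for illustration only.
    """
    rows = max(1, math.ceil(n_items / columns))
    positions = []
    for i in range(n_items):
        row, col = divmod(i, columns)
        positions.append((col * cell_w, header_h + row * cell_h))
    return positions, (columns * cell_w, header_h + rows * cell_h)
```

With PIL, each crop would be resized to exactly cell_w by cell_h (for example `crop.resize((cell_w, cell_h))`) and pasted at `positions[i]` on a canvas created with `Image.new("RGB", size, "white")`; the sheet then grows downward instead of dropping faces.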
In [12]:
# For a given keyword, build one tall image that stacks the contact sheet of
# every page whose OCR text contains that keyword
def face_from_key(keyword, namelist, path):
    # accept a single file name as well as a list of names
    if isinstance(namelist, str):
        namelist = [namelist]

    im = []
    h = 0
    for name in namelist:
        # case-sensitive substring match on the OCR text of the page
        if keyword in text_extract(name, path):
            a = im_crop_n_paste(name, 876, path)
            im.append(a)
            h += a.height

    if not im:
        # no page mentions the keyword; return None instead of failing on im[0]
        return None

    # every per-page sheet is 876 px wide, so stack them vertically
    contact_sheet = Image.new(im[0].mode, (im[0].width, h))
    y = 0
    for img in im:
        contact_sheet.paste(img, (0, y))
        y += img.height
    return contact_sheet
In [13]:
training_list[0]
Out[13]:
'a-0.png'
In [14]:
display(face_from_key("Mark",training_list,'training/'))
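
Each call to face_from_key re-runs Tesseract on every page, which is a large part of why a search over images.zip takes so long. A minimal sketch, assuming the text of every page fits comfortably in memory: run OCR once per page, keep the results in a dict, and answer each keyword search from that dict. The names build_text_index and pages_with_keyword are hypothetical; the OCR function is passed in so that text_extract (or anything with the same (name, path) signature) can be plugged in.

```python
from typing import Callable, Dict, Iterable, List


def build_text_index(names: Iterable[str], path: str,
                     ocr: Callable[[str, str], str]) -> Dict[str, str]:
    """Run ocr(name, path) once per page and remember the text.

    Hypothetical helper for illustration only.
    """
    return {name: ocr(name, path) for name in names}


def pages_with_keyword(index: Dict[str, str], keyword: str) -> List[str]:
    """Return page names whose cached text contains keyword.

    Uses the same case-sensitive substring test as face_from_key.
    """
    return [name for name, text in index.items() if keyword in text]
```

In the notebook this might look like `index = build_text_index(training_list, 'training/', text_extract)` run once, after which `pages_with_keyword(index, 'Mark')` returns immediately and each matching name can be passed to im_crop_n_paste.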